Generalized LR Parsing in Haskell
Abstract
Parser combinators elegantly and concisely model generalised LL parsers in a purely functional language. They nicely illustrate the concepts of higher-order functions, polymorphic functions and lazy evaluation. Indeed, parser combinators are often presented as a motivating example for functional programming. Generalised LL, however, has an important drawback: it does not handle context-free grammars that are directly or indirectly left-recursive. In a different context, the (non-functional) parsing community has done a considerable amount of work on generalised LR parsing. Such parsers handle virtually any context-free grammar. Surprisingly, little work has been done on generalised LR by the functional programming community ([9] is a good exception). In this report, we present a concise and elegant implementation of an incremental generalised LR parser generator and interpreter in Haskell. For good computational complexity, such parsers rely heavily on lazy evaluation. Incremental evaluation is obtained via function memoisation. An implementation of our generalised LR parser generator is available as the HaGLR tool. We assess the performance of this tool with some benchmark examples.
1 Motivation
The Generalised LR parsing algorithm was first introduced by Tomita [14] in the context of natural language processing. Several improvements have been proposed in the literature, among others to handle context-free grammars with hidden left-recursion [11, 7]. An improved algorithm has been implemented in scannerless form in the SGLR parser [3]. More recently, GLR capabilities have been added to Yacc-like tools, such as Bison.
The advantage of GLR over LR is compositionality. When adding rules to grammars, or taking the union of several grammars, the LR parsing algorithm will stumble into conflicts. These conflicts must be eliminated by massaging the augmented grammar before parsing can proceed.
This massaging effort is a pain by itself, but even more so because of its effect on the associated semantic functionality: it precludes as-is reuse of AST processing components developed for the initial grammar. GLR requires no such grammar changes. It tolerates conflicts, simply forking off alternative parsers as necessary. These alternative parsers are either killed off when they run into parse errors or merged when they converge to the same state. In the end, a single parser may survive, indicating an unambiguous global parse. When several parsers survive, additional disambiguation effort is needed to select a parse tree from the resulting parse forest. GLR’s performance can be lower than LR’s, but not dramatically so [11].
In the context of Haskell, two general approaches to parsing are in vogue. One approach is offered by the Happy parser generator, which produces bottom-up parsers in Haskell from grammar definitions, much like Yacc does for C. Like Yacc, Happy is restricted to LALR parsing, and thus lacks compositionality as explained above. The other approach is offered by several libraries of parser combinators. With these, top-down parsers can be constructed directly in Haskell. The main disadvantage of this approach, and of LL parsing in general, is that it fails to terminate on left-recursive rules. To eliminate left-recursion, the LL parser developer is forced to massage the grammar, which requires considerable effort and makes the grammar less natural.
Given the above considerations, it is natural to long for GLR support in Haskell. One approach would be to extend Happy from LALR to GLR. For a more in-depth analysis of the drawbacks of traditional parsing methods, we refer the reader to [4]. In earlier work, one of the authors provided GLR support for Haskell in a not fully integrated fashion, namely by invoking an external GLR parser [5, 8]. During the development of this project, such an extension was pre-announced in the
Similar resources
Layout-Sensitive Generalized Parsing
The theory of context-free languages is well-understood and context-free parsers can be used as off-the-shelf tools in practice. In particular, to use a context-free parser framework, a user does not need to understand its internals but can specify a language declaratively as a grammar. However, many languages in practice are not context-free. One particularly important class of such languages ...
HASDF: A Generalized LR-parser Generator for Haskell
Language-centered software engineering requires language technology that (i) handles the full class of context-free grammars, and (ii) accepts grammars that contain syntactic information only. The syntax definition formalism SDF combined with GLR-parser generation offers such technology. We propose to make SDF and GLR-parsing available for use with various programming languages. We have done so...
A New Parallel Algorithm for Generalized LR Parsing
Tomita's parsing algorithm [Tomita 86], which adapted the LR parsing algorithm to context-free grammars, makes use of a breadth-first strategy to handle LR table conflicts. As the breadth-first strategy is compatible with parallel processing, we can easily develop a parallel generalized LR parser based on Tomita's algorithm [Tanaka 89]. However, there is a problem in that this algorithm synchr...
Faster Generalized LR Parsing
Tomita devised a method of generalized LR (GLR) parsing to parse ambiguous grammars efficiently. A GLR parser uses linear-time LR parsing techniques as long as possible, falling back on more expensive general techniques when necessary. Much research has addressed speeding up LR parsers. However, we argue that this previous work is not transferable to GLR parsers. Instead, we speed up LR parsers b...
A New Approach to the Construction of Generalized LR Parsing Algorithms
LR parsing strategies can analyze LR grammars, which are deterministic. If we consider LR parsing tables in which each entry can contain several actions, we obtain non-deterministic LR parsing, often known as generalized LR parsing, which can analyze non-deterministic context-free grammars. In this context, some mechanism is needed in order to represent the non-deterministic evolution of the st...